COVID-19 outbreak analysis in Germany with R

This analysis follows the blog posts by Tim Churches, starting with his post from February 2020: https://timchurches.github.io/blog/posts/2020-02-18-analysing-covid-19-2019-ncov-outbreak-data-with-r-part-1/#estimating-changes-in-the-effective-reproduction-number

The code has been slightly modified and some graphs have been tweaked. The goal is to help scientists and non-scientists alike gain insights and conduct their own analysis of the situation. The main code is in covidAnalysis.R; I show selected results here on the main page. Please feel free to contribute.

Disclaimer

I am not a medical doctor; I am just a data dude who wants to help citizen data scientists stay informed and check the numbers we are confronted with every day. If, on the other hand, you understand more about epidemiology, feel free to use anything you find here. Remember that Tim Churches presents most of this much better.

Data Acquisition

Data are pulled from the Johns Hopkins University (JHU) CSSE GitHub archive: https://raw.githubusercontent.com/CSSEGISandData/.
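A minimal sketch of the reshaping step, assuming the JHU time-series CSV layout: columns Province/State, Country/Region, Lat, Long, followed by one column per date in m/d/yy format. The helper to_long(), the toy numbers and the day-index origin are illustrative assumptions, not the code from covidAnalysis.R. With real data, the wide table would come from read.csv() on the full URL of a time-series file in the archive, with check.names = FALSE so that column names such as "Province/State" survive unchanged.

```r
# Toy wide table in the assumed JHU time-series layout (numbers are made up).
wide <- data.frame(
  "Province/State" = c("", ""),
  "Country/Region" = c("Germany", "Italy"),
  Lat  = c(51.0, 43.0),
  Long = c(9.0, 12.0),
  "3/1/20" = c(10, 20),
  "3/2/20" = c(15, 30),
  "3/3/20" = c(22, 45),
  check.names = FALSE
)

# Turn one country's mainland row (empty Province/State) into a long table.
to_long <- function(wide, country) {
  row <- wide[wide[["Country/Region"]] == country &
                wide[["Province/State"]] == "", ]
  date_cols <- names(wide)[-(1:4)]  # every column after Lat/Long is a date
  out <- data.frame(
    Date = as.Date(date_cols, format = "%m/%d/%y"),
    cumulative_cases = as.numeric(unlist(row[1, date_cols]))
  )
  # Hypothetical day index counted from the first date in the table.
  out$myDay <- as.numeric(out$Date - min(out$Date)) + 1
  out
}

datLong <- to_long(wide, "Germany")
datLong
```

The result is named datLong to mirror the table used in the linear model below; the real script may define myDay with a different origin.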

For Germany, for example, the curve looks like this:

Some people are interested in how cases grow after, for example, the 100th case. Most of the figures I see show absolute numbers, which I find misleading, because countries differ greatly in population. Using the wppExplorer package, I obtained the total population of each European country, so that case counts can be expressed relative to population.
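Once population figures are available, normalising is a one-line operation. The sketch below expresses cumulative cases per 100,000 inhabitants; the country labels, case counts and populations are placeholders, not values obtained from wppExplorer, and the column names are assumptions.

```r
# Placeholder data: cumulative cases and population per country (made-up numbers).
cases <- data.frame(
  country = c("A", "B"),
  cumulative_cases = c(1000, 1000),
  population = c(80e6, 10e6)
)

# Same absolute count, very different burden relative to population.
cases$cases_per_100k <- cases$cumulative_cases / cases$population * 1e5
cases
```

Both countries report 1,000 cases, but relative to population the burden in country B is eight times higher.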

Development of cases in European countries after reaching 100 infections. Only mainland provinces are considered (e.g. France, but not St. Martin).
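Aligning the curves at the 100th case means re-indexing each country's series from the first day its cumulative count reaches 100. A minimal base-R sketch, assuming a long table with columns country, Date and cumulative_cases; the toy numbers and the helper align_at_threshold() are hypothetical.

```r
# Toy long table (made-up numbers), one row per country and date.
dat <- data.frame(
  country = rep(c("A", "B"), each = 4),
  Date = rep(as.Date("2020-03-01") + 0:3, times = 2),
  cumulative_cases = c(50, 120, 200, 310,   # A passes 100 on the 2nd day
                       90,  99, 150, 260)   # B passes 100 on the 3rd day
)

# Keep days with at least `threshold` cases and count days since reaching it.
# Assumes cumulative (non-decreasing) counts, so the kept days are contiguous.
align_at_threshold <- function(dat, threshold = 100) {
  parts <- lapply(split(dat, dat$country), function(d) {
    d <- d[order(d$Date), ]
    d <- d[d$cumulative_cases >= threshold, ]
    if (nrow(d) == 0) return(NULL)          # country never reached threshold
    d$days_since <- as.numeric(d$Date - min(d$Date))
    d
  })
  do.call(rbind, parts)
}

aligned <- align_at_threshold(dat)
aligned
```

Plotting cumulative_cases (or cases per 100,000) against days_since then puts all countries on a common time axis.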

Implemented Features

As of today (2020-03-22), only simple models are considered.

Linear modelling

If case counts grow exponentially, their logarithm grows linearly in time, so a simple linear model can easily be fitted to the log-transformed data:

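library(dplyr)

# datLong (with columns Date, myDay, cumulative_cases) is built in covidAnalysis.R.
# Fit log(cumulative cases) against the day index, using data from 2020-02-24 on.
myLinearModel = lm(log(cumulative_cases) ~ myDay,
                   data = datLong %>%
                     filter(Date >= as.Date("2020-02-24")))
summary(myLinearModel)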
## 
## Call:
## lm(formula = log(cumulative_cases) ~ myDay, data = datLong %>% 
##     filter(Date >= as.Date("2020-02-24")))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43240 -0.14867  0.01961  0.11401  0.41033 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.22839    0.22299  -23.45   <2e-16 ***
## myDay        0.28313    0.00522   54.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2113 on 25 degrees of freedom
## Multiple R-squared:  0.9916, Adjusted R-squared:  0.9912 
## F-statistic:  2942 on 1 and 25 DF,  p-value: < 2.2e-16
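Because the model is linear in log(cumulative_cases), the slope r is a daily growth rate on the log scale: cases are multiplied by exp(r) each day and double every log(2)/r days. Plugging in the myDay estimate from the summary above (0.28313) gives a growth factor of about 1.33 per day and a doubling time of about 2.45 days. The sketch does the arithmetic by hand; in the real script the slope would come from coef(myLinearModel)["myDay"].

```r
# Slope of myDay from the summary output above (log-scale daily growth rate).
r <- 0.28313

growth_factor <- exp(r)        # multiplicative increase per day
doubling_time <- log(2) / r    # days until the case count doubles

round(growth_factor, 3)
round(doubling_time, 2)
```

These figures only hold while growth stays exponential; any flattening of the curve shows up as a shrinking slope.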

With a linear model, crude predictions can be made:
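A sketch of such a crude prediction using predict() with a prediction interval. To stay self-contained it fits toy data generated from a known exponential curve (10 * exp(0.25 * day), rounded), not real case counts; with the real data one would call predict(myLinearModel, newdata = ...) with future values of myDay. The interval is computed on the log scale and back-transformed with exp(), so it is only as good as the assumption that growth stays exponential.

```r
# Toy data from a known exponential curve (not real case counts).
toy <- data.frame(day = 1:20)
toy$cases <- round(10 * exp(0.25 * toy$day))

fit <- lm(log(cases) ~ day, data = toy)

# Predict the next five days on the log scale, then back-transform.
future <- data.frame(day = 21:25)
pred_log <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
pred <- exp(pred_log)   # columns fit, lwr, upr on the case scale

cbind(future, round(pred))
```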

SIR modelling

A more sophisticated approach is the SIR model (Susceptible, Infectious, Recovered), which describes the outbreak as a system of ordinary differential equations (ODEs). See the code and Tim Churches' posts for details. Fitting it yields the following:
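For reference, the standard SIR system is dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I, with transmission rate beta, recovery rate gamma, population N and basic reproduction number R0 = beta/gamma. The sketch below integrates it with a hand-written fixed-step Runge-Kutta (RK4) scheme so it runs without extra packages; an established solver such as deSolve::ode would be the usual choice. The functions sir_rhs() and simulate_sir() and all parameter values are illustrative assumptions, not fitted values or the code from covidAnalysis.R.

```r
# Right-hand side of the SIR system; state = c(S, I, R).
sir_rhs <- function(state, beta, gamma, N) {
  S <- state[1]; I <- state[2]
  c(-beta * S * I / N,
     beta * S * I / N - gamma * I,
     gamma * I)
}

# Integrate with classic fourth-order Runge-Kutta, recording once per day.
simulate_sir <- function(init, beta, gamma, days, dt = 0.1) {
  N <- sum(init)
  steps_per_day <- round(1 / dt)
  dt <- 1 / steps_per_day            # make the step divide one day exactly
  out <- matrix(NA_real_, nrow = days + 1, ncol = 3,
                dimnames = list(NULL, c("S", "I", "R")))
  out[1, ] <- init
  state <- init
  for (d in seq_len(days)) {
    for (k in seq_len(steps_per_day)) {
      k1 <- sir_rhs(state, beta, gamma, N)
      k2 <- sir_rhs(state + dt / 2 * k1, beta, gamma, N)
      k3 <- sir_rhs(state + dt / 2 * k2, beta, gamma, N)
      k4 <- sir_rhs(state + dt * k3, beta, gamma, N)
      state <- state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    }
    out[d + 1, ] <- state
  }
  data.frame(day = 0:days, out)
}

# Illustrative parameters (not fitted): R0 = beta / gamma = 2.5
N <- 1e6
res <- simulate_sir(init = c(N - 10, 10, 0), beta = 0.5, gamma = 0.2, days = 200)
res[which.max(res$I), ]  # day and size of the epidemic peak
```

Fitting the model, as in Tim Churches' posts, means choosing beta and gamma so that the simulated I (or I + R) matches the observed case curve.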